SPEECH DATA COMPRESSION/EXPANSION
APPARATUS AND METHOD 



BACKGROUND OF THE INVENTION 
1. Field of the Invention

The present invention relates to a compression apparatus for 
compressing waveform dictionary data composed of speech waveform data 
used for speech synthesis to create a compressed dictionary, and an expansion 
apparatus for expanding compressed data of the compressed dictionary. 


2. Description of the Related Art 

Due to the recent rapid development of computer technology, speech
synthesis technology, the use of which has conventionally been limited to
particular fields, is becoming applicable to various fields. Along with this,
there is an increasing demand for high quality speech reproduction in speech
synthesis.

In order to realize high quality speech synthesis, it is necessary to
prepare a large amount of sound waveform data, which has a relatively large
data size and consumes considerable computer resources
such as a storage device (e.g., a disk). Thus, various methods for
compressing such sound waveform data have been considered.

For example, Figure 1 is a view showing the principle of a
compression/expansion apparatus that has often been used. In Figure 1,
reference numeral 11 denotes a dictionary data input part, 12 denotes a
dictionary data compression part, 13 denotes a compressed dictionary data
storing part, 14 denotes a speech dictionary database, 15 denotes a dictionary
data expansion part, and 16 denotes an expanded waveform data output part.

In Figure 1, the dictionary data is composed of waveform data 111, a
phoneme label 112, and pitch information 113. In such a conventional
compression/expansion apparatus, only the waveform data 111 is compressed
and expanded. Thus, in the dictionary data compression part 12, the input
waveform data 111 is compressed, and stored in the speech dictionary
database 14 by the compressed dictionary data storing part 13.

The compressed waveform data stored in the speech dictionary 
database 14 is expanded in the dictionary data expansion part 15 during 
speech synthesis, and reproduced in the expanded waveform data output
part 16.

However, according to the above-mentioned compression/expansion
method, conventional waveform data is compressed as it is. Therefore, in the
case where waveform data in the original dictionary is configured not in a
phoneme unit but in a corpus unit, it is difficult to determine which portion of
the corpus a phoneme or a syllable to be used for speech synthesis corresponds
to, and it is necessary to expand all the data compressed in a corpus unit. This
requires a considerable period of time for expansion, and makes it difficult to
perform speech synthesis in real time.

Furthermore, in the case where compressed speech waveform data is
expanded for speech synthesis, the signal-to-noise ratio (SNR) is likely to
decrease in a rising portion of the speech, so that it is difficult to perform
high quality reproduction.

SUMMARY OF THE INVENTION 

Therefore, with the foregoing in mind, it is an object of the present
invention to provide a speech data compression/expansion apparatus and
method for correcting a compression position and an expansion position in
waveform data, thereby ensuring a real time property of speech synthesis and
realizing high quality speech synthesis.

In order to achieve the above-mentioned object, a speech data
compression/expansion apparatus of the present invention includes: a
dictionary data input part for extracting speech data containing waveform
data from an existing speech waveform dictionary and inputting the extracted
speech data;

a compression position determining part for specifying a part used for
speech synthesis in the waveform data, and setting a starting point and an
ending point for compression before and after the part;

a dictionary data compression part for compressing the waveform data
with respect to a compression interval specified by the starting point and the
ending point for compression; and a dictionary data expansion part for
expanding the compressed waveform data,

wherein the specified compression interval, in which an expansion
result of the compressed waveform data has highest quality, is determined as
a compression/expansion position, and the compressed waveform data, and
the starting point and the ending point for compression are registered in a
database as the waveform data used for speech synthesis.

Because of the above structure, a compression position in the
waveform data can be arbitrarily determined, and the capacity of waveform
data to be compressed can be minimized to a required capacity. Therefore, an
expansion time can be shortened, and a real time property during speech
synthesis can be ensured.

Furthermore, in the speech data compression/expansion apparatus of
the present invention, it is preferable that, in the compression position
determining part, the part used for speech synthesis in the waveform data is
specified, and the starting point and the ending point for compression are
provisionally set before and after the part. It is also preferable that the
apparatus further includes: a dictionary data compression part for
compressing the waveform data with respect to the specified compression
interval; a dictionary data expansion part for expanding the compressed
waveform data; and an SNR calculating part for calculating an SNR with
respect to the expanded waveform data, and the specified compression
interval, having a highest SNR, is determined as a compression/expansion
position, and the compressed waveform data is registered in a database as the
waveform data used for speech synthesis.

Because of the above structure, a compression position in the
waveform data can be determined based on a position having the highest SNR
during speech synthesis, high quality speech synthesis can be performed, and
the capacity of waveform data to be compressed can be minimized to a
required capacity. Therefore, an expansion time can be shortened, and a real
time property of speech synthesis can be ensured.

Furthermore, it is preferable that the speech data
compression/expansion apparatus of the present invention further includes an
expansion position determining part for setting a starting point and an ending
point for expansion before and after the compressed waveform data registered
in a database as the waveform data used for speech synthesis. This is
because an expansion position in the waveform data can be arbitrarily
determined, and high quality speech synthesis can be performed.

Furthermore, it is preferable that, in the compression position
determining part, the starting point and the ending point for compression are
determined in a pitch unit. Furthermore, it is preferable that, in the
compression position determining part, the starting point and the ending
point for compression are determined in a frame unit. This is because a
starting point and an ending point for compression can be easily specified.

Next, in order to achieve the above-mentioned object, the speech data
expansion apparatus of the present invention is characterized in that the
waveform data compressed by the above-mentioned speech data
compression/expansion apparatus of the present invention and stored in a
database is expanded.

Because of the above structure, using a database storing compressed
waveform data, a large population of waveform data can be held, and
appropriate waveform data can be selected therefrom and expanded. Thus,
by using a speech data expansion apparatus of the present invention, a speech
synthesis apparatus of higher quality can be constituted.

Next, in order to achieve the above object, a speech data
compression/expansion apparatus of the present invention includes: a
dictionary data input part for extracting speech data containing waveform
data from an existing speech waveform dictionary and inputting the extracted
speech data; a compression position determining part for specifying a part
used for speech synthesis in the waveform data, and determining a
compression position containing the part; a dictionary data compression part
for compressing the waveform data with respect to the compression position;
an expansion position determining part for setting a starting point and an
ending point for expansion before and after the compressed waveform data;
and a dictionary data expansion part for expanding the compressed waveform
data with respect to an expansion interval specified by the starting point and
the ending point for expansion, wherein the specified expansion interval, in
which an expansion result of the compressed waveform data has highest
quality, is determined as an expansion position, and the compressed waveform
data, and the starting point and the ending point for expansion are registered
in a database as the waveform data used for speech synthesis.

Because of the above structure, an expansion position in the waveform
data can be arbitrarily determined, and the capacity of waveform data to be
expanded can be minimized to a required capacity. Therefore, an expansion
time can be shortened, and a real time property of speech synthesis can be
ensured.

Next, in order to achieve the above object, a speech data expansion
apparatus of the present invention is characterized in that the waveform data
in which the expansion interval is determined by the above-mentioned speech
data compression/expansion apparatus of the present invention and which is
stored in a database is expanded.

Because of the above structure, using a database storing compressed
waveform data, a large population of waveform data can be held,
appropriate waveform data can be selected therefrom and expanded, and
waveform data having higher expansion quality can be used. Thus, by using
a speech data expansion apparatus of the present invention, a speech
synthesis apparatus of higher quality can be constituted.

Furthermore, in the speech data compression/expansion apparatus of
the present invention, it is preferable that, in the expansion position
determining part, the starting point and the ending point for expansion are
provisionally set before and after the compressed waveform data. It is also
preferable that the apparatus further includes: a dictionary data expansion
part for expanding the compressed waveform data with respect to the
specified expansion interval; and an SNR calculating part for calculating an
SNR with respect to the expanded waveform data, wherein the specified
expansion interval, having a highest SNR, is determined as an expansion
position. This is because an expansion position in the compressed waveform
data can be determined based on a position having a high SNR during speech
synthesis, and high quality speech synthesis can be performed.

Furthermore, it is preferable that, in the expansion position
determining part, the starting point and the ending point for expansion are
determined in a pitch unit. Furthermore, it is preferable that, in the
expansion position determining part, the ending point for expansion is
determined based on the number of bytes for bit filling and the starting point.
This is because a starting point and an ending point for expansion of the
compressed waveform data can easily be specified.

Next, in order to achieve the above object, a speech data expansion
system of the present invention is characterized in that the waveform data
compressed by the above-mentioned speech data compression/expansion
apparatus of the present invention and stored in a database is expanded.

Because of the above structure, using a database storing compressed
waveform data, a large population of waveform data can be held, and
appropriate waveform data can be selected therefrom and expanded. Thus,
by using a speech data expansion apparatus of the present invention, a speech
synthesis apparatus of higher quality can be constituted.

Next, in order to achieve the above object, a speech data expansion
system of the present invention is characterized in that the waveform data in
which the expansion interval is determined by the above-mentioned speech
data compression/expansion apparatus of the present invention and which is
stored in a database is expanded.

Because of the above structure, using a database storing compressed
waveform data, a large population of waveform data can be held,
appropriate waveform data can be selected therefrom and expanded, and
waveform data having higher expansion quality can be used. Thus, by using
a speech data expansion apparatus of the present invention, a speech
synthesis apparatus of higher quality can be constituted.

Furthermore, the present invention is characterized by software
executed so as to perform the functions of the above-mentioned speech data
compression/expansion apparatus as processing steps of a computer. More
specifically, the present invention is characterized by a method including:
extracting speech data containing waveform data from an existing speech
waveform dictionary and inputting the extracted speech data; specifying a
part used for speech synthesis in the waveform data, and setting a starting
point and an ending point for compression before and after the part;
compressing the waveform data with respect to a compression interval
specified by the starting point and the ending point for compression; and
expanding the compressed waveform data, wherein the specified compression
interval, in which an expansion result of the compressed waveform data has
highest quality, is determined as a compression/expansion position, and the
compressed waveform data, and the starting point and the ending point for
compression are registered in a database as the waveform data used for
speech synthesis. The present invention is also characterized by a
computer-readable recording medium storing these operations as a program.

Because of the above structure, the program is loaded onto a computer
so as to be executed, whereby a compression position in the waveform data
can be arbitrarily determined, and the capacity of the waveform data to be
compressed can be minimized to a required capacity. Therefore, a speech
data compression/expansion apparatus can be realized, which can shorten an
expansion time and ensure a real time property of speech synthesis.

Furthermore, the present invention is characterized by software
executed so as to perform the functions of the above-mentioned speech data
compression/expansion apparatus as processing steps of a computer. More
specifically, the present invention is characterized by a method including:
extracting speech data containing waveform data from an existing speech
waveform dictionary and inputting the extracted speech data; specifying a
part used for speech synthesis in the waveform data, and determining a
compression interval including the part; compressing the waveform data with
respect to the compression interval; setting a starting point and an ending
point for expansion before and after the compressed waveform data; and
expanding the compressed waveform data with respect to an expansion
interval specified by the starting point and the ending point for expansion,
wherein the specified expansion interval, in which an expansion result of the
compressed waveform data has highest quality, is determined as an expansion
position, and the compressed waveform data, and the starting point and the
ending point for expansion are registered in a database as the waveform data
used for speech synthesis. The present invention is also characterized by a
computer-readable recording medium storing these operations as a program.

Because of the above structure, by loading the program onto a
computer so as to be executed, more appropriate waveform data can be
selected from a large population of waveform data, so that a speech
synthesis apparatus of higher quality can be realized.

These and other advantages of the present invention will become
apparent to those skilled in the art upon reading and understanding the
following detailed description with reference to the accompanying figures.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram of a conventional speech data
compression/expansion apparatus.

Figure 2 is a block diagram of a speech data compression/expansion
apparatus in an embodiment of the present invention.

Figure 3 is a block diagram showing an example of a speech data 
compression/expansion apparatus in the present embodiment. 

Figure 4 is a block diagram showing another example of a speech data
compression/expansion apparatus in the present embodiment.

Figure 5 is a block diagram illustrating speech synthesis in a speech 
data compression/expansion apparatus in an embodiment of the present 
invention. 

Figure 6 is a block diagram showing an example of a speech data
compression/expansion apparatus of the present invention.

Figure 7 is a block diagram showing another example of a speech data 
compression/expansion apparatus of the present invention. 



Figure 8 is a flow chart illustrating the processing in a speech data
compression/expansion apparatus in an embodiment of the present invention.

Figure 9 illustrates a recording medium.



DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, a speech data compression/expansion apparatus in an
embodiment of the present invention will be described with reference to the
drawings. Figure 2 is a block diagram showing the principle of the speech
data compression/expansion apparatus in the present embodiment. In
Figure 2, reference numeral 21 denotes a compressed dictionary data storing
part, 22 denotes a compression position determining part, 23 denotes an
expansion position determining part, and 24 denotes an SNR calculating part.

As shown in Figure 2, dictionary data is composed of waveform
data 111, a phoneme label 112, and pitch information 113, in the same way as
in the conventional example shown in Figure 1. In the present embodiment,
only the waveform data 111 is compressed and expanded in the same way as
in the conventional compression/expansion apparatus. However, not all of the
waveform data 111 is compressed. A section to be compressed (i.e., a
starting point and an ending point for compression) is set, and only that
section is compressed. Thus, in the dictionary data compression part 12, the
phoneme label 112 and the pitch information 113, as well as the input
waveform data 111, are stored as information required for determining a
compression position in the speech dictionary database 14 by the compressed
dictionary data storing part 21.

Various methods for determining a compression position are
considered. First, it is considered that expansion is performed while a
starting point and an ending point for compression are being changed, and a
section having the highest SNR in a phoneme or syllable unit, based on an
SNR measured in each case, is determined as a compression interval. In this
case, a compression position cannot be determined in a single step, and is
determined by the processing in the compression position determining part 22
as shown in Figure 3. Figure 3 illustrates the concept of waveform data
compression in the speech data compression/expansion apparatus in the
present embodiment. In Figure 3, reference numeral 31 denotes waveform
data to be compressed and 32 denotes additional data placed before and after
the waveform data 31 to be compressed.

Referring to Figure 3, in (a) showing the entire original waveform
data, a starting point 33 and an ending point 34 of the waveform data 31 used
for speech synthesis are determined. If the waveform data 31 is compressed
as it is, it is difficult to maintain a high SNR in a rising portion of the
speech during expansion. Therefore, a starting point and an ending point
during compression are provisionally set before and after the waveform
data 31 to be compressed. More specifically, the additional data 32 having an
appropriate data length is included before and after the waveform data 31
used for speech synthesis, whereby a starting point 35 for compression and an
ending point 36 for compression are provisionally set. A data length of the
additional data 32 may be determined in a frame unit, or a sample unit or a
pitch unit of a corpus, etc.

As represented by (b), the waveform data 31 is compressed together
with the additional data 32, and the waveform data 31 is expanded in the
dictionary data expansion part 15 as represented by (c). The expanded
waveform data 31 used for speech synthesis can be obtained, maintaining a
high SNR, whereas a leading point of the additional data 32 has a low SNR
due to the influence of noise. Thus, by deleting the additional data 32 while
leaving a waveform data section 37 used for speech synthesis, expanded
waveform data with a high SNR can be obtained.

In the expansion position determining part 23, the starting point and
the ending point of a part used for speech synthesis in the resultant expanded
waveform data are matched with the starting point and the ending point of a
section to be expanded. In the SNR calculating part 24, an SNR between the
expanded waveform data and the original waveform data is calculated, and
the calculated result is sent to the compression position determining part 22.
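
The SNR comparison between the original and the expanded waveform data can be sketched as follows. The text does not state a formula, so this example assumes the conventional definition (signal energy divided by error energy, expressed in decibels); the function name snr_db is hypothetical.

```python
import math

def snr_db(original, expanded):
    """SNR in dB between an original segment and its expanded counterpart.

    Assumption: SNR = 10 * log10(sum(x^2) / sum((x - y)^2)).
    """
    if len(original) != len(expanded):
        raise ValueError("segments must have the same length")
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, expanded))
    if noise == 0:
        return math.inf   # perfect reconstruction
    if signal == 0:
        return -math.inf  # silent original with a nonzero error
    return 10.0 * math.log10(signal / noise)
```

Both segments are expected to cover the same part used for speech synthesis, which is why the starting and ending points are matched before the calculation.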

In the compression position determining part 22, the above-mentioned
processing is repeated while the starting point and the ending point during
compression are being changed to obtain the calculated SNR for each setting,
and the compression position with the highest SNR among the calculated
results is stored as compression position information 144.
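
The repeated search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the codec of the embodiment: toy_compress and toy_expand are a hypothetical clamped delta coder chosen only because it reconstructs poorly at the rising portion of a segment, and the function names, the padding step, and the SNR formula are likewise assumptions.

```python
import math

def toy_compress(samples, limit=8):
    """Stand-in codec (an assumption): delta coding with a clamped step."""
    codes, pred = [], 0
    for x in samples:
        d = max(-limit, min(limit, x - pred))
        codes.append(d)
        pred += d
    return codes

def toy_expand(codes):
    """Inverse of toy_compress: accumulate the clamped deltas."""
    out, pred = [], 0
    for d in codes:
        pred += d
        out.append(pred)
    return out

def snr_db(original, expanded):
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, expanded))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)

def best_compression_interval(wave, start, end, max_pad, step):
    """Return (c_start, c_end, snr) for the provisional interval with the highest SNR."""
    best = None
    for pad_before in range(0, max_pad + 1, step):
        for pad_after in range(0, max_pad + 1, step):
            c_start = max(0, start - pad_before)
            c_end = min(len(wave), end + pad_after)
            expanded = toy_expand(toy_compress(wave[c_start:c_end]))
            # Delete the additional data, keeping only the part used for synthesis.
            part = expanded[start - c_start:end - c_start]
            snr = snr_db(wave[start:end], part)
            if best is None or snr > best[2]:
                best = (c_start, c_end, snr)
    return best
```

Because the comparison is strict, the shortest interval that reaches the highest SNR is kept, which is in line with the stated aim of keeping the compressed data to the required capacity.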

A method for determining an ending point of a compression interval in 
a frame unit is also considered. In this case, in the compression position 
determining part 22, an ending point of a compression interval is determined, 
based on a frame unit in the dictionary data compression part 12. 

Furthermore, a method for deleting a silence interval from the 
original data to leave only a speech interval, and determining the speech 
interval as a compression interval is considered. In this case, in the 
compression position determining part 22, the silence interval is extracted 
and deleted from the phoneme label 112 and the pitch information 113, and 
the speech interval is determined as a compression interval. 
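
Silence removal of this kind can be sketched as follows. The label layout is an assumption: each entry is taken to be a (name, start, end) tuple in sample indices, with silence marked by the hypothetical names "sil" or "pau". The text does not specify how the phoneme label 112 encodes silence.

```python
def remove_silence(wave, labels, silence_marks=("sil", "pau")):
    """Drop silence intervals and return (speech_wave, relabelled_intervals).

    labels: list of (name, start, end) tuples, end exclusive (assumed layout).
    """
    speech, new_labels, offset = [], [], 0
    for name, start, end in labels:
        if name in silence_marks:
            continue  # silence interval: extracted and deleted
        speech.extend(wave[start:end])
        length = end - start
        # Positions shift once silence is removed, so the labels are rewritten.
        new_labels.append((name, offset, offset + length))
        offset += length
    return speech, new_labels
```

The returned labels are re-indexed against the speech-only waveform, since the original positions no longer apply after deletion.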

Furthermore, in order to exclude provisional setting of a compression 
position, the following methods are also considered: a method for compressing 
waveform data in a unit of the original data (i.e., in the case where waveform 
data is obtained in a corpus unit, the data is compressed in a corpus unit); a 
method for partitioning waveform data at an equal interval; a method in 
which a starting point of a compression interval is set several pitches before 
the part used for speech synthesis, based on the phoneme label 112 and the 
pitch information 113 of dictionary data; and the like. 

According to these methods, a compression position can be determined
in a single step in the compression position determining part 22. Therefore, a
starting point and an ending point of a compression position determined in
the compression position determining part 22 are stored in the speech
dictionary database 14 as compressed waveform data 141.
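
As an illustration of setting a starting point several pitches before the part used for speech synthesis, the sketch below assumes that the pitch information 113 is available as a sorted list of pitch-mark sample positions. That representation and the function name are assumptions made for this example.

```python
import bisect

def start_n_pitches_before(pitch_marks, part_start, n_pitches):
    """Return the pitch mark lying n_pitches before the part used for synthesis.

    pitch_marks: sorted sample positions of pitch marks (assumed representation).
    """
    # Index of the last pitch mark at or before the start of the part.
    idx = bisect.bisect_right(pitch_marks, part_start) - 1
    if idx < 0:
        return part_start  # no earlier pitch mark: nothing to back off to
    return pitch_marks[max(0, idx - n_pitches)]
```

No compression or expansion trial is needed here, which is why such a position can be fixed without any provisional setting.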

In the case where the waveform data used for speech synthesis is a 
part of the compressed waveform data, a section during expansion is 
determined in the expansion position determining part 23 and stored as 
expansion position information 145. 

Herein, roughly three methods for determining an expansion position
can be considered as follows: a method in which expansion is conducted while
a starting point and an ending point of an expansion interval are being
changed, and an interval with the highest SNR in a phoneme or syllable unit,
based on an SNR measured in each case, is determined as an expansion
interval; a method in which a starting point during expansion is automatically
set several pitches before the part used for speech synthesis, based on the
phoneme label and the pitch information; and a method in which an ending
point of an expansion interval is automatically calculated based on the
number of bytes for bit filling found from the expansion results and the
starting position, thereby obtaining an expansion interval.

First, according to the method in which expansion is conducted while
a starting point and an ending point of an expansion interval are being
changed, and an interval with the highest SNR in a phoneme or syllable unit,
based on an SNR measured in each case, is determined as an expansion
interval, an expansion position cannot be confirmed in a single step, and is
determined by conducting the processing in the expansion position
determining part 23 as shown in Figure 4. Figure 4 illustrates the concept of
waveform data expansion in the speech data compression/expansion
apparatus in the present embodiment. In Figure 4, reference numeral 41
denotes waveform data to be compressed and 42 denotes additional data
placed before and after the compressed waveform data.

In Figure 4, the waveform data used for speech synthesis is registered
in the speech dictionary database 14 in a compressed state as represented by
(b). If such compressed waveform data is expanded as it is, the entire
original waveform data becomes as represented by (a). Therefore, there is a
high possibility that a starting point 43 and an ending point 44 of the
waveform data 41 used for speech synthesis will have a low SNR during
expansion.

In order to prevent waveform data used for speech synthesis from
picking up noise during expansion, additional data 42 having an appropriate
data length is added before and after compressed waveform data 48, and a
starting point 45 for expansion and an ending point 46 for expansion are
provisionally set. A data length of such additional data may be determined
in a frame unit, or in a sample unit or a pitch unit of a corpus, etc.

Compressed data 49 is expanded in the dictionary data expansion
part 15 as represented by (c) in Figure 4. The expanded waveform data 47
used for speech synthesis can be obtained, maintaining a high SNR, whereas
a leading point of the additional data 42 has a low SNR due to the influence of
noise. Thus, by deleting the additional data while leaving a waveform data
section 47 used for speech synthesis, expanded waveform data with a high
SNR can be obtained.

In the expansion position determining part 23, the starting point and
the ending point of the part used for speech synthesis in the resultant
expanded waveform data are matched with the starting point and the ending
point of a section to be expanded, and in the SNR calculating part 24, an SNR
between the expanded waveform data and the original waveform data is
calculated, and the calculated results are sent to the expansion position
determining part 23.

In the expansion position determining part 23, calculated results of an 
SNR are obtained while changing a starting point and an ending point during 
expansion, whereby an expansion position with the highest SNR is obtained 
and stored as expansion position information. 

According to the method for automatically setting a starting point
during expansion several pitches before the part used for speech synthesis,
based on the phoneme label and the pitch information, an expansion position
can be determined in a single step in the expansion position determining
part 23.

Furthermore, according to the method for automatically calculating
an ending point based on the number of bytes for bit filling found from the
compression results and the starting position, thereby obtaining an expansion
interval, in the expansion position determining part 23, an ending point is
automatically calculated based on the number of bytes for bit filling and the
starting point during expansion, and the interval thus obtained is determined
as an expansion interval and stored as expansion position information.

Furthermore, the compressed waveform data stored in the speech
dictionary database 14 is expanded in the dictionary data expansion part 15
during speech synthesis, and reproduced in the expanded waveform data
output part 16. Specifically, as shown in Figure 5, a speech synthesizing
part 51 is provided, whereby a synthesized speech can be reproduced on a
syllable basis. This will be described in more detail below.

Figure 6 is a block diagram showing an example of a speech data
compression/expansion apparatus of the present invention. First, the
compression position determining part 22 and the expansion position
determining part 23 are constituted as shown in Figure 6. More specifically,
in the compression position determining part 22, reference numeral 221
denotes a silence interval deleting part, 222 denotes a speech interval
waveform generating part, and 223 denotes a compression interval setting
part. In the expansion position determining part 23, reference numeral 231
denotes a syllable extracting part, 232 denotes a syllable waveform section
extracting part, 233 denotes an expansion interval setting part, and 234
denotes an expansion interval and SNR storing part.

First, it is assumed that waveform data of a corpus "I am keeping
dogs" is stored in the speech dictionary database 14. A silence interval of the
waveform data 111 is extracted and deleted, based on the phoneme label 112
and the pitch information 113 in the silence interval deleting part 221. Then,
a waveform composed only of a speech part is generated in the speech interval
waveform generating part 222, and stored as waveform data 111.

In the compression interval setting part 223, the entire speech
interval from the beginning to the end of the corpus is specified, and the
starting point and the ending point thereof are stored as the compression
position information 144. The waveform data of the speech part in the
corpus "I am keeping dogs" is compressed, and the result is stored as the
compressed waveform data 141.

In the dictionary data compression part 12, the waveform data of the
speech part in the corpus "I am keeping dogs" is compressed, and the result is
stored as the compressed waveform data 141. A new phoneme label and
pitch information regarding the stored compressed waveform data are also
stored in the speech dictionary database 14 as phoneme label 142 and the
pitch information 143.

Furthermore, in setting an expansion interval, syllable parts of the
corpus "I am keeping dogs" are extracted in the syllable extracting part 231.
More specifically, four syllable parts: "I", "am", "keeping", and "dogs" are
extracted.

Then, for each of the extracted syllables, a starting point and
an ending point in the waveform data 111 before compression are detected
in the syllable waveform section extracting part 232. In the
expansion interval setting part 233, a starting point and an ending point in
the compressed waveform data 141 are provisionally set for each syllable,
based on the starting point and the ending point in the waveform data 111
before compression.

Various setting methods are conceivable: a method in which
a starting point or an ending point for expansion is set to be one to
several frames before or after the starting point or the ending point in the
required waveform data 111 before compression; a method in which a starting
point or an ending point for expansion is set to be one to several samples
before or after the starting point or the ending point in the required waveform
data 111 before compression; a method in which a starting point or an ending
point for expansion is set to be one to several pitches before or after the
starting point or the ending point in the required waveform data 111 before
compression; and the like.
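
The provisional settings described above can be enumerated as in the following sketch. It is a simplified illustration, not the disclosed implementation: the function name `candidate_intervals` and its parameters are hypothetical, and only outward widening (an earlier starting point, a later ending point) is generated, so that the required part always stays inside the candidate. The `unit` argument stands for one sample, one frame, or one pitch period, expressed in samples.

```python
from typing import List, Tuple


def candidate_intervals(start: int, end: int, unit: int,
                        max_units: int, total_len: int) -> List[Tuple[int, int]]:
    """Return provisional (start, end) pairs widened outward in whole units.

    `unit` is the step in samples: 1 for sample units, a frame length for
    frame units, or a pitch period for pitch units (hypothetical values).
    """
    candidates: List[Tuple[int, int]] = []
    for before in range(max_units + 1):
        for after in range(max_units + 1):
            s = max(0, start - before * unit)        # clamp to the data start
            e = min(total_len, end + after * unit)   # clamp to the data end
            if (s, e) not in candidates:             # skip duplicates from clamping
                candidates.append((s, e))
    return candidates


# Required part [100, 200), widened by up to one 10-sample frame on each side.
frame_candidates = candidate_intervals(100, 200, unit=10, max_units=1, total_len=205)
```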

In the dictionary data expansion part 15, the expansion interval
provisionally set in the expansion interval setting part 233 is actually
expanded, and an SNR is calculated in the SNR calculating part 24 and stored
in the expansion interval and SNR storing part 234. Among the data stored in
the expansion interval and SNR storing part 234, the interval data having
the highest SNR is determined as the expansion interval, and the starting
point and the ending point of the interval data are stored in the expansion
position storing part 145.
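
A minimal sketch of the SNR calculation and the selection step follows. It assumes the conventional definition of SNR (signal power over error power, in decibels), which this passage does not spell out; the names `snr_db` and `best_interval` and the sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def snr_db(reference: Sequence[float], decoded: Sequence[float]) -> float:
    """SNR in dB between the waveform before compression and its expansion."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0:
        return math.inf      # perfect reconstruction
    if signal == 0:
        return -math.inf     # all-zero reference with a nonzero error
    return 10 * math.log10(signal / noise)


def best_interval(stored: List[Tuple[int, int, float]]) -> Tuple[int, int]:
    """Pick the (start, end) whose stored SNR is highest."""
    start, end, _ = max(stored, key=lambda record: record[2])
    return start, end


# Hypothetical contents of the expansion interval and SNR store.
stored = [(0, 10, 12.5), (2, 12, 18.0), (4, 14, 15.1)]
chosen = best_interval(stored)
```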

In actual expansion, when a syllable to be expanded is input, in the
dictionary data expansion part 15, expansion is performed based on the
interval data stored in the expansion position storing part 145. Only the
required part of the expanded waveform data is cut out and used.
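
Cutting out the required part amounts to an offset calculation, sketched below under the assumption that all positions are sample indices in the waveform before compression and that ending points are exclusive. The function name `cut_required_part` is hypothetical.

```python
from typing import List, Sequence


def cut_required_part(expanded: Sequence[int], exp_start: int,
                      req_start: int, req_end: int) -> List[int]:
    """Return samples [req_start, req_end) from data expanded from exp_start."""
    if req_start < exp_start or req_end - exp_start > len(expanded):
        raise ValueError("required part lies outside the expanded interval")
    # Shift the required positions into the coordinates of the expanded data.
    return list(expanded[req_start - exp_start:req_end - exp_start])


# Expanded interval starts at sample 98; the syllable itself occupies [100, 104).
expanded = [10, 11, 12, 13, 14, 15, 16, 17]
required = cut_required_part(expanded, exp_start=98, req_start=100, req_end=104)
```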

Figure 7 is a block diagram showing another example of a speech data
compression/expansion apparatus of the present invention. The structure of
this apparatus is the same as that shown in Figure 6 except for the structure
of the compression position determining part 22. Thus, the description of the
expansion position determining part 23 is omitted here. In the compression
position determining part 22, reference numeral 224 denotes a syllable
extracting part and 225 denotes a compression interval and SNR storing part.

In the same way as in Figure 6, it is assumed that waveform data of a
corpus "I am keeping dogs" is stored in the speech dictionary database 14. In
the silence interval deleting part 221, a silence interval of the waveform
data 111 is extracted and deleted, based on the phoneme label 112 and the pitch
information 113. In the speech interval waveform generating part 222, a
waveform composed of only a speech part is generated, and stored as
waveform data 111.

In the syllable extracting part 224, syllable parts in the corpus "I am
keeping dogs" are extracted. More specifically, four syllable parts: "I", "am",
"keeping", and "dogs" are extracted.

In the compression interval setting part 223, for each extracted
syllable (for example, "dogs"), additional data is added before the starting
point and after the ending point of the waveform data before compression, as
shown in Figure 4, and a compression interval is thereby provisionally set.
The data in the compression interval is compressed in the dictionary data
compression part 12. The compression method is as described above.

The compressed data is then expanded once in the dictionary data
expansion part 15, and an SNR between the expanded waveform data output
from the expanded waveform data output part 16 and the waveform data 111
before compression is calculated in the SNR calculating part 24, and stored
in the compression interval and SNR storing part 225 together with the
starting point and the ending point of the compression interval.

Among the data stored in the compression interval and SNR storing 
part 225, the section data with the highest SNR is determined as an
expansion interval, and the starting point and the ending point of the section 
data are stored in the expansion position storing part 145. 
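
The search over provisional compression intervals can be sketched as below. Several points are assumptions made only for illustration: the codec is a toy clamped-difference (DPCM-like) scheme standing in for the compression method described earlier, the SNR is evaluated over the required part only, and the names `encode`, `decode`, and `search_compression_interval` are hypothetical. The toy codec needs a few samples to track the signal, which is why added data before the syllable can raise the SNR.

```python
import math
from typing import List, Sequence, Tuple


def encode(samples: Sequence[int], max_step: int = 4) -> List[int]:
    """Toy lossy codec: differences clamped to +/-max_step, predictor starts at 0."""
    codes, pred = [], 0
    for x in samples:
        d = max(-max_step, min(max_step, x - pred))
        codes.append(d)
        pred += d
    return codes


def decode(codes: Sequence[int]) -> List[int]:
    """Inverse of encode: accumulate the stored differences."""
    out, pred = [], 0
    for d in codes:
        pred += d
        out.append(pred)
    return out


def snr_db(reference: Sequence[int], decoded: Sequence[int]) -> float:
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10 * math.log10(signal / noise)


def search_compression_interval(wave: Sequence[int], req_start: int, req_end: int,
                                margins: Sequence[int]):
    """Try each margin, store (start, end, SNR), and return the best record."""
    store: List[Tuple[int, int, float]] = []  # role of the interval-and-SNR store
    for m in margins:
        s = max(0, req_start - m)
        e = min(len(wave), req_end + m)
        expanded = decode(encode(wave[s:e]))          # compress, then expand once
        part = expanded[req_start - s:req_end - s]    # required part only
        store.append((s, e, snr_db(wave[req_start:req_end], part)))
    return max(store, key=lambda r: r[2]), store


wave = [0, 4, 8, 12, 16, 20, 20, 20, 20, 20]
best, store = search_compression_interval(wave, 5, 10, margins=[0, 2, 5])
```

In this example the widest candidate reconstructs the required part exactly, so it is the one selected.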

In actual expansion, when a syllable to be expanded is input, in the
dictionary data expansion part 15, expansion is performed based on the
interval data stored in the expansion position storing part 145. Only the
required part of the expanded waveform data is cut out and used.

As described above, according to the present embodiment, a
compression position and an expansion position in the waveform data can be
determined based on the position having the highest SNR in speech synthesis,
which enables high quality speech synthesis to be performed.

Furthermore, the capacity of waveform data to be compressed
can be minimized to a required value. Therefore, an expansion time can be
shortened, and a real time property of speech synthesis can be ensured.

Next, a processing flow of a program realizing the speech data
compression/expansion apparatus in the present embodiment will be
described. Figure 8 shows a flow chart illustrating the processing of this
program.

In Figure 8, when waveform data is extracted from an existing speech
waveform dictionary or the like and input (Operation 81), a part to be used for
speech synthesis in the waveform data is specified, and a starting point and an
ending point for compression are provisionally set before and after the part to
be used for speech synthesis (Operation 82).

Next, the provisionally set compression interval is compressed and
expanded (Operation 83). If the quality of the expanded waveform data is
high (Operation 84: Yes), the provisionally set compression interval is
determined as a compression/expansion position (Operation 85) and
registered in a database as waveform data used for speech synthesis
(Operation 86). If the quality of the expanded waveform data is not high
(Operation 84: No), the compression position is provisionally set again
(Operation 87), and the above-mentioned processing is repeated.
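
The flow of Figure 8 can be sketched as a loop. The sketch rests on assumptions not stated in the flow chart: quality is judged by comparing an SNR against a threshold, re-setting the position means widening the margin by one sample, the loop gives up after `max_margin` attempts, and `roundtrip` is a toy clamped-difference codec standing in for the actual compression and expansion. All names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def roundtrip(samples: Sequence[int], max_step: int = 4) -> List[int]:
    """Toy stand-in for compress-then-expand (Operation 83): clamped DPCM."""
    out, pred = [], 0
    for x in samples:
        pred += max(-max_step, min(max_step, x - pred))
        out.append(pred)
    return out


def snr_db(reference: Sequence[int], decoded: Sequence[int]) -> float:
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10 * math.log10(signal / noise)


def determine_position(wave: Sequence[int], req_start: int, req_end: int,
                       threshold_db: float, max_margin: int = 16) -> Optional[Dict]:
    """Widen the provisional interval until the expanded quality is high enough."""
    for margin in range(max_margin + 1):          # Operation 82, then 87 on retry
        s = max(0, req_start - margin)
        e = min(len(wave), req_end + margin)
        expanded = roundtrip(wave[s:e])           # Operation 83
        quality = snr_db(wave[req_start:req_end],
                         expanded[req_start - s:req_end - s])
        if quality >= threshold_db:               # Operation 84: Yes
            # Operations 85 and 86: fix the position and return the record
            # that would be registered in the database.
            return {"start": s, "end": e, "snr_db": quality}
    return None                                   # no candidate met the threshold


wave = [0, 4, 8, 12, 16, 20, 20, 20, 20, 20]
record = determine_position(wave, req_start=5, req_end=10, threshold_db=20.0)
```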

Examples of a recording medium storing a program realizing the
speech data compression/expansion apparatus in the present embodiment
include not only a portable recording medium 92 such as a CD-ROM 92-1 and
a floppy disk 92-2, but also a storage device 91 provided at the end of a
communication line and another storage device 94 such as a hard disk and a
RAM of a computer 93, as shown in examples of a recording medium in
Figure 9. When the program is executed, it is loaded into a main memory
and run.

Furthermore, examples of a recording medium storing compressed
data and the like generated by the speech data compression/expansion
apparatus in the present embodiment include not only a portable recording
medium 92 such as a CD-ROM 92-1 and a floppy disk 92-2, but also a storage
device 91 provided at the end of a communication line and another storage
device 94 such as a hard disk and a RAM of a computer 93, as shown in
examples of a recording medium in Figure 9. For example, the recording
medium is read by a computer when the speech data compression/expansion
apparatus of the present invention is used.

As described above, according to the speech data
compression/expansion apparatus of the present invention, a compression
position and an expansion position in waveform data can be determined based
on a position having the highest SNR during speech synthesis, which enables
high quality speech synthesis to be performed.

Furthermore, according to the speech data compression/expansion
apparatus of the present invention, a capacity of waveform data to be
compressed can be minimized to a required value; therefore, an expansion
time can be shortened and a real time property of speech synthesis can be
ensured.

The invention may be embodied in other forms without departing from
the spirit or essential characteristics thereof. The embodiments disclosed in
this application are to be considered in all respects as illustrative and not
limiting. The scope of the invention is indicated by the appended claims
rather than by the foregoing description, and all changes which come within
the meaning and range of equivalency of the claims are intended to be
embraced therein.

WHAT IS CLAIMED IS:



1. A speech data compression/expansion apparatus, comprising:

a dictionary data input part for extracting speech data containing
waveform data from an existing speech waveform dictionary and inputting
the extracted speech data;

a compression position determining part for specifying a part used for
speech synthesis in the waveform data, and setting a starting point and an
ending point for compression before and after the part;

a dictionary data compression part for compressing the waveform data
with respect to a compression interval specified by the starting point and the
ending point for compression; and

a dictionary data expansion part for expanding the compressed
waveform data,

wherein the specified compression interval, in which an expansion
result of the compressed waveform data has highest quality, is determined as
a compression/expansion position, and the compressed waveform data, and
the starting point and the ending point for compression are registered in a
database as the waveform data used for speech synthesis.

2. A speech data compression/expansion apparatus according to claim 1,
wherein, in the compression position determining part, the part used for
speech synthesis in the waveform data is specified, and the starting point and
the ending point for compression are provisionally set before and after the
part,

the apparatus further includes:

a dictionary data compression part for compressing the waveform data
with respect to the specified compression interval;

a dictionary data expansion part for expanding the compressed
waveform data; and

an SNR calculating part for calculating an SNR with respect to the
expanded waveform data, and

the specified compression interval, having a highest SNR, is
determined as a compression/expansion position, and the compressed
waveform data is registered in a database as the waveform data used for
speech synthesis.

3. A speech data compression/expansion apparatus according to claim 1,
further comprising an expansion position determining part for setting a
starting point and an ending point for expansion before and after the
compressed waveform data registered in a database as the waveform data
used for speech synthesis,

wherein the waveform data is expanded with respect to an expansion
interval specified by the starting point and the ending point for expansion in
the dictionary data expansion part.

4. A speech data compression/expansion apparatus according to claim 1,
wherein, in the compression position determining part, the starting point and
the ending point for compression are determined in a pitch unit.

5. A speech data compression/expansion apparatus according to claim 1,
wherein, in the compression position determining part, the starting point and
the ending point for compression are determined in a frame unit.

6. A speech data expansion apparatus for expanding the waveform data stored
in a database, compressed by the speech data compression/expansion
apparatus, comprising:

a dictionary data input part for extracting speech data containing
waveform data from an existing speech waveform dictionary and inputting
the extracted speech data;

a compression position determining part for specifying a part used for
speech synthesis in the waveform data, and setting a starting point and an
ending point for compression before and after the part;

a dictionary data compression part for compressing the waveform data
with respect to a compression interval specified by the starting point and the
ending point for compression; and

a dictionary data expansion part for expanding the compressed
waveform data,

wherein the specified compression interval, in which an expansion
result of the compressed waveform data has highest quality, is determined as
a compression/expansion position, and the compressed waveform data, and
the starting point and the ending point for compression are registered in a
database as the waveform data used for speech synthesis.

7. A speech data expansion apparatus for expanding the waveform data stored
in a database, compressed by the speech data compression/expansion
apparatus, comprising:

a dictionary data input part for extracting speech data containing
waveform data from an existing speech waveform dictionary and inputting
the extracted speech data;

a compression position determining part for specifying a part used for
speech synthesis in the waveform data, and setting a starting point and an
ending point for compression before and after the part;

a dictionary data compression part for compressing the waveform data
with respect to a compression interval specified by the starting point and the
ending point for compression; and

a dictionary data expansion part for expanding the compressed
waveform data,

wherein the specified compression interval, in which an expansion
result of the compressed waveform data has highest quality, is determined as
a compression/expansion position, and the compressed waveform data, and
the starting point and the ending point for compression are registered in a
database as the waveform data used for speech synthesis, and wherein, in the
compression position determining part, the starting point and the ending
point for compression are determined in a frame unit.

8. A speech data compression/expansion apparatus, comprising:

a dictionary data input part for extracting speech data containing
waveform data from an existing speech waveform dictionary and inputting
the extracted speech data;

a compression position determining part for specifying a part used for
speech synthesis in the waveform data, and determining a compression
position containing the part;

a dictionary data compression part for compressing the waveform data
with respect to the compression position;

an expansion position determining part for setting a starting point
and an ending point for expansion before and after the compressed waveform
data; and

a dictionary data expansion part for expanding the compressed
waveform data with respect to an expansion interval specified by the starting
point and the ending point for expansion,

wherein the specified expansion interval, in which an expansion result
of the compressed waveform data has highest quality, is determined as an
expansion position, and the compressed waveform data, and the starting point
and the ending point for expansion are registered in a database as the
waveform data used for speech synthesis.

9. A speech data expansion apparatus for expanding the waveform data stored
in a database, in which the expansion interval is determined by the speech
data compression/expansion apparatus, comprising:

a dictionary data input part for extracting speech data containing
waveform data from an existing speech waveform dictionary and inputting
the extracted speech data;

a compression position determining part for specifying a part used for
speech synthesis in the waveform data, and determining a compression
position containing the part;

a dictionary data compression part for compressing the waveform data
with respect to the compression position;

an expansion position determining part for setting a starting point
and an ending point for expansion before and after the compressed waveform
data; and

a dictionary data expansion part for expanding the compressed
waveform data with respect to an expansion interval specified by the starting
point and the ending point for expansion,

wherein the specified expansion interval, in which an expansion result
of the compressed waveform data has highest quality, is determined as an
expansion position, and the compressed waveform data, and the starting point
and the ending point for expansion are registered in a database as the
waveform data used for speech synthesis.

10. A speech data compression/expansion apparatus according to claim 8,
wherein, in the expansion position determining part, the starting point and
the ending point for expansion are provisionally set before and after the
compressed waveform data,

the apparatus further includes:

a dictionary data expansion part for expanding the compressed
waveform data with respect to the specified expansion interval; and

an SNR calculating part for calculating an SNR with respect to the
expanded waveform data,

wherein the specified expansion interval, having a highest SNR, is
determined as an expansion position.

11. A speech data compression/expansion apparatus according to claim 8,
wherein, in the expansion position determining part, the starting point and
the ending point for expansion are determined in a pitch unit.

12. A speech data compression/expansion apparatus according to claim 8,
wherein, in the expansion position determining part, the ending point for
expansion is determined based on the number of bytes for bit filling and the
starting point.

13. A speech data compression/expansion method, comprising:

extracting speech data containing waveform data from an existing
speech waveform dictionary and inputting the extracted speech data;

specifying a part used for speech synthesis in the waveform data, and
setting a starting point and an ending point for compression before and after
the part;

compressing the waveform data with respect to a compression interval
specified by the starting point and the ending point for compression; and

expanding the compressed waveform data,

wherein the specified compression interval, in which an expansion
result of the compressed waveform data has highest quality, is determined as
a compression/expansion position, and the compressed waveform data, and
the starting point and the ending point for compression are registered in a
database as the waveform data used for speech synthesis.

14. A speech data compression/expansion method, comprising:

extracting speech data containing waveform data from an existing
speech waveform dictionary and inputting the extracted speech data;

specifying a part used for speech synthesis in the waveform data, and
determining a compression interval including the part;

compressing the waveform data with respect to the compression
interval;

setting a starting point and an ending point for expansion before and
after the compressed waveform data; and

expanding the compressed waveform data with respect to an
expansion interval specified by the starting point and the ending point for
expansion,

wherein the specified expansion interval, in which an expansion result
of the compressed waveform data has highest quality, is determined as an
expansion position, and the compressed waveform data, and the starting point
and the ending point for expansion are registered in a database as the
waveform data used for speech synthesis.

15. A speech data expansion system for expanding the waveform data stored
in a database, compressed by the speech data compression/expansion
apparatus, comprising:

a dictionary data input part for extracting speech data containing
waveform data from an existing speech waveform dictionary and inputting
the extracted speech data;

a compression position determining part for specifying a part used for
speech synthesis in the waveform data, and setting a starting point and an
ending point for compression before and after the part;

a dictionary data compression part for compressing the waveform data
with respect to a compression interval specified by the starting point and the
ending point for compression; and

a dictionary data expansion part for expanding the compressed
waveform data,

wherein the specified compression interval, in which an expansion
result of the compressed waveform data has highest quality, is determined as
a compression/expansion position, and the compressed waveform data, and
the starting point and the ending point for compression are registered in a
database as the waveform data used for speech synthesis.

16. A speech data expansion system for expanding the waveform data stored
in a database, compressed by the speech data compression/expansion
apparatus, comprising:

a dictionary data input part for extracting speech data containing
waveform data from an existing speech waveform dictionary and inputting
the extracted speech data;

a compression position determining part for specifying a part used for
speech synthesis in the waveform data, and setting a starting point and an
ending point for compression before and after the part;

a dictionary data compression part for compressing the waveform data
with respect to a compression interval specified by the starting point and the
ending point for compression; and

a dictionary data expansion part for expanding the compressed
waveform data,

wherein the specified compression interval, in which an expansion
result of the compressed waveform data has highest quality, is determined as
a compression/expansion position, and the compressed waveform data, and
the starting point and the ending point for compression are registered in a
database as the waveform data used for speech synthesis, and wherein, in the
compression position determining part, the starting point and the ending
point for compression are determined in a frame unit.

17. A speech data expansion system for expanding the waveform data stored 
in a database, in which the expansion interval is determined by the speech 
data compression/expansion apparatus, comprising: 

a dictionary data input part for extracting speech data containing 
waveform data from an existing speech waveform dictionary and inputting 
the extracted speech data; 

a compression position determining part for specifying a part used for 
speech synthesis in the waveform data, and determining a compression 
position containing the part; 

a dictionary data compression part for compressing the waveform data 
with respect to the compression position; 

an expansion position determining part for setting a starting point 
and an ending point for expansion before and after the compressed waveform 
data; and 

a dictionary data expansion part for expanding the compressed 
waveform data with respect to an expansion interval specified by the starting 
point and the ending point for expansion, 

wherein the specified expansion interval, in which an expansion result 
of the compressed waveform data has highest quality, is determined as an 
expansion position, and the compressed waveform data, and the starting point 
and the ending point for expansion are registered in a database as the
waveform data used for speech synthesis. 



18. A computer-readable recording medium storing a program to be executed
by a computer, the program comprising:

extracting speech data containing waveform data from an existing
speech waveform dictionary and inputting the extracted speech data;

specifying a part used for speech synthesis in the waveform data, and
setting a starting point and an ending point for compression before and after
the part;

compressing the waveform data with respect to a compression interval
specified by the starting point and the ending point for compression; and

expanding the compressed waveform data,

wherein the specified compression interval, in which an expansion
result of the compressed waveform data has highest quality, is determined as
a compression/expansion position, and the compressed waveform data, and
the starting point and the ending point for compression are registered in a
database as the waveform data used for speech synthesis.

19. A computer-readable recording medium storing a program to be executed
by a computer, the program comprising:

extracting speech data containing waveform data from an existing
speech waveform dictionary and inputting the extracted speech data;

specifying a part used for speech synthesis in the waveform data, and
determining a compression interval including the part;

compressing the waveform data with respect to the compression
interval;

setting a starting point and an ending point for expansion before and
after the compressed waveform data; and

expanding the compressed waveform data with respect to an
expansion interval specified by the starting point and the ending point for
expansion,

wherein the specified compression interval, in which an expansion
result of the compressed waveform data has highest quality, is determined as
an expansion position, and the compressed waveform data, and the starting
point and the ending point for expansion are registered in a database as the
waveform data used for speech synthesis.

ABSTRACT OF THE DISCLOSURE

Speech data containing waveform data is extracted from an existing
speech waveform dictionary and input. A part used for speech synthesis in
the waveform data is specified, and a starting point and an ending point for
compression are set before and after the part. The waveform data is
compressed with respect to a compression interval specified by the starting
point and the ending point for compression. The compressed waveform data
is expanded, and the compression interval, in which an expansion result of the
compressed waveform data has highest quality, is determined as a
compression/expansion position. The compressed waveform data, and the
starting point and the ending point for compression are registered in a
database as waveform data used for speech synthesis.


Japanese Language Declaration

POWER OF ATTORNEY: As a named inventor, I hereby appoint the following
attorney(s) and/or agent(s) to prosecute this application and transact all
business in the Patent and Trademark Office connected therewith (list name
and registration number):

James D. Halsey, Jr., 22,729; Harry John Staas, 22,010; David M. Pitcher,
25,908; John C. Garvey, 28,607; J. Randall Beckers, 30,358; William F.
Herbert, 31,024; Richard A. Gollhofer, 31,106; Mark J. Henry, 36,162; Gene M.
Garner II, 34,172; Michael D. Stein, 37,240; Paul I. Kravetz, 35,230; Gerald P.
Joyce, III, 37,64S; Todd E. Marlette, 35,269; Harlan B. Williams, Jr., 34,756;
George N. Stevens, 36,938; Michael C. Soldner, 41,455; Norman L. Ourada,
41,235; Kevin R. Spivak, P-43,148; and William M. [surname illegible],
35,348 (agent)

Send Correspondence to:

STAAS & HALSEY
700 Eleventh Street, N.W.
Suite 500
Washington, D.C. 20001

Full name of sole or first inventor

Chikako MATSUMOTO

Inventor's signature / Date

November 21. 200 3

Residence

Kawasaki, Japan

Citizenship

Japanese

Post Office Address

c/o FUJITSU LIMITED, 1-1, Kamikodanaka 4-chome,
Nakahara-ku, Kawasaki-shi, Kanagawa 211-8588

